15 research outputs found

    Necessary and Sufficient Conditions for Novel Word Detection in Separable Topic Models

    The simplicial condition and other stronger conditions that imply it have recently played a central role in developing polynomial-time algorithms with provable asymptotic consistency and sample complexity guarantees for topic estimation in separable topic models. Of these algorithms, those that rely solely on the simplicial condition are impractical, while the practical ones need stronger conditions. In this paper, we demonstrate, for the first time, that the simplicial condition is a fundamental, algorithm-independent, information-theoretic necessary condition for consistent separable topic estimation. Furthermore, under solely the simplicial condition, we present a practical quadratic-complexity algorithm based on random projections which consistently detects all novel words of all topics using only up to second-order empirical word moments. This algorithm is amenable to distributed implementation, making it attractive for 'big-data' scenarios involving a network of large distributed databases.

    A New Geometric Approach to Latent Topic Modeling and Discovery

    A new geometrically-motivated algorithm for nonnegative matrix factorization is developed and applied to the discovery of latent "topics" for text and image "document" corpora. The algorithm is based on robustly finding and clustering extreme points of empirical cross-document word-frequencies that correspond to novel "words" unique to each topic. In contrast to related approaches that are based on solving non-convex optimization problems using suboptimal approximations, locally-optimal methods, or heuristics, the new algorithm is convex, has polynomial complexity, and has competitive qualitative and quantitative performance compared to the current state-of-the-art approaches on synthetic and real-world datasets. Comment: This paper was submitted to the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2013 on November 30, 201

    Data-analysis strategies for image-based cell profiling

    Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences. Peer reviewed.

    Estimation of nitrogen content in cucumber plant (Cucumis sativus L.) leaves using hyperspectral imaging data with neural network and partial least squares regressions

    In recent years, farmers have often mistakenly resorted to overuse of chemical fertilizers to increase crop yield. However, excessive consumption of fertilizers might lead to severe food poisoning. If nutritional deficiencies are detected early, farmers can design better fertigation practices before the problem becomes unsolvable. The aim of this study is to predict the nitrogen (N) content in cucumber (Cucumis sativus L., var. Super Arshiya-F1) plant leaves using hyperspectral imaging (HSI) techniques and three different regression methods: a hybrid artificial neural network-particle swarm optimization (ANN-PSO) method; partial least squares regression (PLSR); and one-dimensional deep learning convolutional neural networks (CNN). Cucumber seeds were planted in 20 different pots. After the plants had grown, the pots were categorized and three levels of nitrogen overdose were applied, one per category: 30%, 60% and 90% excesses, called N30%, N60% and N90%, respectively. HSI images of plant leaves were captured before and after the application of nitrogen excess. A regression model was developed for each individual category. Results showed that, on the test set, the mean regression coefficients (R) were in the ranges 0.937–0.965 for ANN-PSO, 0.975–0.997 for PLSR, and 0.965–0.985 for CNN. We conclude that regression models have a remarkable ability to accurately predict the nitrogen content in cucumber plants from hyperspectral leaf images in a non-destructive way, with PLSR slightly ahead of the CNN and ANN-PSO methods. Ministerio de Ciencia, Innovación y Universidades - Agencia Estatal de Investigación - Fondo Europeo de Desarrollo Regional (grants RTI2018-098958-B-I00 and RTI2018-098156-B-C53)

    Topic Discovery through Data Dependent and Random Projections

    We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms with provable guarantees based on data-dependent and random projections to identify novel words and associated topics. Our key insight here is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state-of-the-art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate qualitative and quantitative merits of our scheme.
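The projection insight in this abstract can be illustrated with a minimal sketch. It assumes an idealized, noise-free separable model built from synthetic data, and it is not the authors' algorithm: the function name `detect_novel_words`, the matrix sizes, and the Dirichlet parameter are all hypothetical choices made for illustration. Each word's row of cross-document frequencies is normalized to sum to one, so that rows of non-novel words are convex combinations of the novel-word rows; projecting onto random directions and recording the words that attain the maximum or minimum then returns only novel words.

```python
import numpy as np

def detect_novel_words(X, num_projections=200, seed=0):
    """Return indices of words whose normalized rows are extreme along random directions.

    X is a (words x documents) nonnegative frequency matrix. This is an
    illustrative, noise-free sketch, not the published algorithm.
    """
    rng = np.random.default_rng(seed)
    # Normalize each word's cross-document frequency pattern to sum to 1.
    X_bar = X / X.sum(axis=1, keepdims=True)
    # One random Gaussian direction per column.
    directions = rng.standard_normal((X.shape[1], num_projections))
    projected = X_bar @ directions  # shape: (words, num_projections)
    # Words attaining the maximum or minimum along each direction.
    extremes = np.concatenate([projected.argmax(axis=0), projected.argmin(axis=0)])
    return set(extremes.tolist())

# Synthetic separable model (all sizes are assumptions for the example).
rng = np.random.default_rng(1)
W, K, M = 30, 3, 500                      # words, topics, documents
beta = rng.uniform(0.1, 1.0, size=(W, K)) # word-topic weights
beta[:K] = np.eye(K)                      # words 0..K-1 are novel: one topic each
beta /= beta.sum(axis=0, keepdims=True)   # each topic is a distribution over words
theta = rng.dirichlet(np.full(K, 0.5), size=M).T  # topic weights per document (K x M)
X = beta @ theta                          # ideal (infinite-sample) word frequencies

detected = detect_novel_words(X)
print(sorted(detected))
```

With finite documents the empirical frequencies are noisy, so a practical method would need a robustness step that this sketch omits.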

    From Harvest to Market: Non-Destructive Bruise Detection in Kiwifruit Using Convolutional Neural Networks and Hyperspectral Imaging

    Fruit is often bruised during picking, transportation, and packaging, which is an important post-harvest issue, especially when dealing with fresh fruit. This paper aims at the early, automatic, and non-destructive ternary (three-class) detection and classification of bruises in kiwifruit based on local spatio-spectral near-infrared (NIR) hyperspectral imaging (HSI). For this purpose, kiwifruit samples were hand-picked at two ripening stages, either one week (7 days) before optimal ripening (unripe) or at the optimal ripening time (ripe). A total of 408 kiwifruit, i.e., 204 for the ripe stage and 204 for the unripe stage, were harvested. For each stage, three classes were considered (68 samples per class). First, 136 HSI images of all undamaged (healthy) fruit samples, under the two ripening categories (either unripe or ripe), were acquired. Next, bruising was artificially induced on the 272 fruits by the impact of a metal ball to generate the corresponding bruised fruit HSI image samples. Then, the HSI images of all bruised fruit samples were captured either 8 h (Bruised-1) or 16 h (Bruised-2) after the damage was produced, generating a grand total of 408 HSI kiwifruit imaging samples. Automatic 3D convolutional neural network (3D-CNN) and 2D-CNN classifiers based on PreActResNet and GoogLeNet models were used to analyze the HSI input data. The results showed that detecting bruising was somewhat easier for unripe fruit than for ripe fruit. The correct classification rates (CCRs) of 3D-CNN-PreActResNet and 3D-CNN-GoogLeNet for unripe fruit were 98% and 96%, respectively, over the test set, while the CCRs of both classifiers for ripe fruit were 86%. On the other hand, the CCRs of 2D-CNN-PreActResNet and 2D-CNN-GoogLeNet for unripe fruit were 96% and 95%, while for ripe fruit the CCRs were 91% and 98%, respectively, computed over the test set. This implies that early detection of the bruised area from HSI images was generally more accurate for unripe fruit than for ripe fruit, the exception being the 2D-CNN-GoogLeNet classifier, which showed the opposite behavior.